Automating RDF Dataset Transformation and Enrichment
Authors
Abstract
With the adoption of RDF across several domains come growing requirements regarding the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim to support this endeavour usually focus on the manual definition of enrichment pipelines. In this paper, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how exemplary descriptions of enriched resources can be used to generate accurate enrichment pipelines. We evaluate our approach against eight manually defined enrichment pipelines and show that it can learn accurate pipelines even when provided with only a small number of training examples.
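The abstract does not specify the refinement operator or the enrichment steps it searches over, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes graphs are sets of `(subject, predicate, object)` string tuples, uses two hypothetical enrichment operators (`drop_predicate` and `rename_predicate`), defines a refinement step as appending one unused operator to a pipeline, and ranks candidate pipelines by triple-level F-measure against the enriched example using a small beam search. All names, parameters, and data below are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

@dataclass(frozen=True)
class Operator:
    """Hypothetical atomic enrichment step: a readable name plus a graph -> graph function."""
    name: str
    apply: Callable[[Graph], Graph]

def drop_predicate(p: str) -> Operator:
    # Remove every triple that uses predicate p.
    return Operator(f"drop({p})", lambda g: frozenset(t for t in g if t[1] != p))

def rename_predicate(old: str, new: str) -> Operator:
    # Rewrite predicate `old` to `new`, keeping subject and object unchanged.
    return Operator(
        f"rename({old}->{new})",
        lambda g: frozenset((s, new if q == old else q, o) for s, q, o in g),
    )

def candidate_operators(source: Graph, target: Graph) -> List[Operator]:
    # Derive candidate steps from one example: predicates found only in the source
    # may be dropped, or renamed to a predicate found only in the target.
    src_only = sorted({p for _, p, _ in source} - {p for _, p, _ in target})
    tgt_only = sorted({p for _, p, _ in target} - {p for _, p, _ in source})
    ops = [drop_predicate(p) for p in src_only]
    ops += [rename_predicate(old, new) for old in src_only for new in tgt_only]
    return ops

def run(pipeline: List[Operator], graph: Graph) -> Graph:
    for op in pipeline:
        graph = op.apply(graph)
    return graph

def f_measure(predicted: Graph, target: Graph) -> float:
    # Triple-level F1 between the pipeline output and the enriched example.
    tp = len(predicted & target)
    if tp == 0:
        return 1.0 if not predicted and not target else 0.0
    precision, recall = tp / len(predicted), tp / len(target)
    return 2 * precision * recall / (precision + recall)

def score(pipeline: List[Operator], examples: List[Tuple[Graph, Graph]]) -> float:
    # Average F-measure over all (source, enriched target) training examples.
    return sum(f_measure(run(pipeline, s), t) for s, t in examples) / len(examples)

def refine(pipeline: List[Operator], candidates: List[Operator]) -> List[List[Operator]]:
    # Refinement operator: extend a pipeline by appending one unused candidate step.
    used = {op.name for op in pipeline}
    return [pipeline + [op] for op in candidates if op.name not in used]

def learn_pipeline(examples, max_length: int = 3, beam_width: int = 5):
    # Collect candidate steps from all examples, deduplicated by name.
    candidates, seen = [], set()
    for s, t in examples:
        for op in candidate_operators(s, t):
            if op.name not in seen:
                seen.add(op.name)
                candidates.append(op)
    best: List[Operator] = []
    best_score = score(best, examples)
    frontier = [best]
    for _ in range(max_length):
        scored = []
        for pipeline in frontier:
            for child in refine(pipeline, candidates):
                child_score = score(child, examples)
                scored.append((child_score, child))
                if child_score > best_score:
                    best, best_score = child, child_score
        if best_score >= 1.0 or not scored:
            break
        # Keep only the highest-scoring pipelines for the next refinement round.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [child for _, child in scored[:beam_width]]
    return best, best_score

# Toy example (hypothetical data): rename a population predicate and drop an internal id.
source = frozenset({
    ("ex:Berlin", "dbo:populationTotal", "3645000"),
    ("ex:Berlin", "ex:internalId", "42"),
    ("ex:Berlin", "rdfs:label", "Berlin"),
})
target = frozenset({
    ("ex:Berlin", "ex:population", "3645000"),
    ("ex:Berlin", "rdfs:label", "Berlin"),
})
pipeline, best_score = learn_pipeline([(source, target)])
print([op.name for op in pipeline], best_score)
# prints: ['rename(dbo:populationTotal->ex:population)', 'drop(ex:internalId)'] 1.0
```

Because each refinement only appends one step, the number of candidate pipelines grows quickly with pipeline length; the beam width caps how many partial pipelines survive each round. A realistic enrichment learner would need a richer set of operators than these two predicate rewrites.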
Similar resources
Full Syntactic Parsing for Enrichment of RDF dataset
RDF data extracted automatically often contain long textual literals. This paper shows how to use natural language processing techniques to automatically generate specific RDF triples from the information in the literals. We look specifically at drug indications found in the DailyMed dataset. We develop knowledge schemas to capture its information as well as precise syntactic-based methods of k...
Testing OWL Axioms against RDF Facts: A Possibilistic Approach
Automatic knowledge base enrichment methods rely critically on candidate axiom scoring. The most popular scoring heuristics proposed in the literature are based on statistical inference. We argue that such a probability-based framework is not always completely satisfactory and propose a novel, alternative scoring heuristic expressed in terms of possibility theory, whereby a candidate axiom rec...
Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data
BACKGROUND The World Wide Web has become a dissemination platform for scientific and non-scientific publications. However, most of the information remains locked up in discrete documents that are not always interconnected or machine-readable. The connectivity tissue provided by RDF technology has not yet been widely used to support the generation of self-describing, machine-readable documents. ...
datos.bne.es: a Library Linked Data Dataset
We describe the datos.bne.es library dataset, which makes available the authority and bibliography catalogue from the Biblioteca Nacional de España (BNE, Spanish National Library) as Linked Data. The catalogue contains around 7 million authority and bibliographic records. The records in MARC 21 format were transformed to RDF and modelled using IFLA ontologies. A tool named MARiMBA automatize th...
Towards Sustainable Extract-Transform-Load Fusion of Company Data
Openly available datasets originate from different data providers which range from government agencies, over commercial enterprises to communities of data enthusiasts. Integrating different source datasets into a single RDF graph by using ETL (Extract-Transform-Load) systems which perform offline transformation, ontology matching and linking techniques usually takes many iterations of revisions...